This paper addresses the problem of detecting forest and non-forest areas in Earth imagery. We propose two statistical approaches to this problem: one based on multiple hypothesis testing with parametric families of distributions, the other on nonparametric tests. The parametric approach is novel in the literature and is relevant to a wider class of problems: detection of natural objects, as well as anomaly detection. We develop the mathematical background for each of the two approaches, use them to build self-sufficient detection algorithms, and discuss the numerical aspects of their implementation. We also compare our algorithms against standard machine-learning algorithms using satellite data.
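The nonparametric route can be illustrated with a toy sketch: compare the pixel-value distribution of an image patch against labeled reference samples using two-sample Kolmogorov-Smirnov statistics and assign the closer class. The data, thresholds, and decision rule below are hypothetical illustrations, not the paper's actual tests.

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def classify_patch(patch, forest_ref, nonforest_ref):
    """Assign the patch to whichever reference sample its pixel values
    are statistically closer to (a hypothetical decision rule)."""
    d_forest = ks_stat(patch.ravel(), forest_ref)
    d_non = ks_stat(patch.ravel(), nonforest_ref)
    return "forest" if d_forest < d_non else "non-forest"

rng = np.random.default_rng(0)
forest_ref = rng.normal(0.30, 0.05, size=2000)      # toy reflectance samples
nonforest_ref = rng.normal(0.60, 0.10, size=2000)
patch = rng.normal(0.31, 0.05, size=(16, 16))       # patch resembling forest
label = classify_patch(patch, forest_ref, nonforest_ref)
```

In practice such per-patch tests would be corrected for multiplicity, which is where the multiple-hypothesis-testing machinery of the parametric approach comes in.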
Transformers have become the state-of-the-art neural network architecture across numerous domains of machine learning. This is partly due to their celebrated ability to transfer and to learn in-context based on few examples. Nevertheless, the mechanisms by which Transformers become in-context learners are not well understood and remain mostly an intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely related to well-known gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers implement gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of optimized Transformers that learn in-context. Furthermore, we identify how Transformers surpass plain gradient descent by an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers.
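The claimed equivalence can be checked numerically in a stripped-down setting: for in-context linear regression from a zero initialization, one gradient-descent step on the regression loss yields the same query prediction as a softmax-free (linear) attention readout of the context tokens, up to the step-size scaling. This is a minimal sketch of the core identity, not the paper's full weight construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, eta = 5, 20, 0.1

# In-context linear regression data: y_i = w* . x_i
w_star = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ w_star
x_q = rng.normal(size=d)                 # query input

# One GD step on L(W) = 1/(2N) * sum_i (W.x_i - y_i)^2, starting from W = 0
W = np.zeros(d)
grad = -(1.0 / N) * (y - X @ W) @ X      # dL/dW at W = 0
W_gd = W - eta * grad
pred_gd = W_gd @ x_q

# Linear (softmax-free) attention readout of the same context:
# the query attends to keys x_i with values y_i, scaled by eta/N
pred_attn = (eta / N) * np.sum(y * (X @ x_q))

assert np.isclose(pred_gd, pred_attn)
```

Both expressions reduce to (eta/N) * sum_i y_i (x_i . x_q), which is why a single linear self-attention layer can implement one gradient-descent step.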
Investment professionals rely on extrapolating company revenue into the future (i.e., revenue forecasting) to approximate the valuation of scale-ups (private companies in a high-growth stage) and to inform their investment decisions. This task is manual and empirical, leaving forecast quality heavily dependent on the investment professionals' experience and insight. Moreover, financial data on scale-ups is typically proprietary, expensive, and scarce, ruling out the wide adoption of data-driven approaches. To this end, we propose a simulation-informed revenue extrapolation (SiRE) algorithm that generates fine-grained long-term revenue predictions on small datasets and short time series. SiRE models revenue dynamics as a linear dynamical system (LDS), which is solved using the EM algorithm. The main innovation lies in how noisy revenue measurements are obtained during training and inference. SiRE works for scale-ups operating in various sectors and provides confidence estimates. Quantitative experiments on two practical tasks show that SiRE significantly outperforms baseline methods. We also observe high performance when SiRE extrapolates from short time series and makes long-term predictions. The balance between performance and efficiency, as well as the interpretability of the results, is likewise empirically validated. Evaluated from the perspective of investment professionals, SiRE can precisely locate scale-ups with great potential returns within 2 to 5 years. Furthermore, our qualitative inspection illustrates several favorable properties of SiRE's revenue forecasts.
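The LDS backbone can be illustrated with a minimal local-linear-trend Kalman filter that extrapolates a short revenue series along with per-step variances (the source of confidence estimates). The parameters below are assumed rather than fitted by EM, and this toy model is not the paper's SiRE algorithm.

```python
import numpy as np

# Local linear trend LDS: state = (level, slope); revenue_t = level_t + noise
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # state transition
H = np.array([[1.0, 0.0]])       # observation: emit the level
Q = 0.01 * np.eye(2)             # process noise (assumed, not EM-fitted)
R = np.array([[0.1]])            # observation noise (assumed)

def kalman_forecast(y, steps):
    """Filter a short series y, then extrapolate `steps` ahead with variances."""
    m, P = np.zeros(2), np.eye(2)
    for obs in y:                             # standard Kalman recursion
        m, P = A @ m, A @ P @ A.T + Q         # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        m = m + K @ (np.atleast_1d(obs) - H @ m)
        P = (np.eye(2) - K @ H) @ P
    preds, vars_ = [], []
    for _ in range(steps):                    # extrapolate beyond the data
        m, P = A @ m, A @ P @ A.T + Q
        preds.append((H @ m)[0])
        vars_.append((H @ P @ H.T + R)[0, 0])
    return np.array(preds), np.array(vars_)

revenue = np.array([1.0, 1.3, 1.7, 2.2, 2.8])   # toy short series
mean, var = kalman_forecast(revenue, steps=4)
```

In an EM-fitted LDS, Q, R, and the transition matrix would be learned from data rather than fixed by hand; the forecast variances grow with the horizon, which is what makes long-term extrapolations honest about their uncertainty.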
We study the problem of example-based procedural texture synthesis using highly compact models. Given a sample image, we use differentiable programming to train a generative process parameterized by a recurrent Neural Cellular Automata (NCA) rule. Contrary to the common belief that neural networks need to be significantly overparameterized, we demonstrate that our model architecture and training procedure allow representing complex texture patterns using just a few hundred learned parameters, making their expressiveness comparable to hand-engineered procedural texture generation programs. The smallest models from the proposed $\mu$NCA family scale down to 68 parameters. When using quantization to one byte per parameter, the proposed models can be shrunk to sizes ranging between 588 and 68 bytes. Implementations of texture generators that use these parameters to produce images can be written in just a few lines of GLSL or C code.
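A recurrent NCA rule of the kind described can be sketched as follows: each cell perceives its own state plus neighbor differences, applies a tiny shared MLP, and updates residually. The channel counts, hidden size, and random weights below are illustrative (this toy rule has 136 parameters) and are not the paper's trained $\mu$NCA models.

```python
import numpy as np

def nca_step(state, w1, b1, w2):
    """One recurrent NCA update: perceive neighbors, apply a shared MLP, add residual."""
    C, H, W = state.shape
    # Perception: own state + x/y neighbor differences (toroidal boundary)
    dx = np.roll(state, -1, axis=2) - np.roll(state, 1, axis=2)
    dy = np.roll(state, -1, axis=1) - np.roll(state, 1, axis=1)
    z = np.concatenate([state, dx, dy], axis=0)   # (3C, H, W) perception vector
    z = z.reshape(3 * C, -1).T                    # (H*W, 3C): one row per cell
    h = np.maximum(z @ w1 + b1, 0.0)              # shared hidden ReLU layer
    update = (h @ w2).T.reshape(C, H, W)          # per-cell residual update
    return state + update

C, H, W, hidden = 4, 16, 16, 8
rng = np.random.default_rng(1)
w1 = rng.normal(scale=0.1, size=(3 * C, hidden))  # 96 params
b1 = np.zeros(hidden)                             #  8 params
w2 = rng.normal(scale=0.1, size=(hidden, C))      # 32 params -> 136 total
state = rng.normal(scale=0.1, size=(C, H, W))     # random initial grid
for _ in range(10):                               # iterate the recurrent rule
    state = nca_step(state, w1, b1, w2)
```

Because the same small rule is applied at every cell and every step, the entire generative process is defined by these few weights, which is what makes sub-kilobyte texture generators feasible.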